Introduction

It’s extremely exciting to analyse Spotify data! I’ve decided to analyse the Shazam playlist that Spotify created for me, hoping it will give the most accurate insight into my taste in music.

First, I’d like to find out whether there are artists I listen to more than others. Then I plan to look at how songs have evolved over the years in terms of their energy levels and popularity scores.

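# loading the packages the code below relies on, then reading the playlist data from a local JSON file
library(tidyverse)
library(jsonlite)
library(gganimate)
spotify_data <- fromJSON("/Users/nevyildirim/Desktop/Stats220/Stats220/Assignment4/spotify.json")

# taking a preliminary look at the data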
spotify_data %>% glimpse()
## Rows: 60
## Columns: 23
## $ playlist_id      <chr> "039eedcSC7R0SzmfHujnd8", "039eedcSC7R0SzmfHujnd8", "…
## $ playlist_name    <chr> "My Shazam Tracks", "My Shazam Tracks", "My Shazam Tr…
## $ track_id         <chr> "00CGiXdQ9ObQow7jwbSfLv", "01iuzK2ziYAVE2n9aSXbWp", "…
## $ track_name       <chr> "Sad Moments", "Hollywood's Bleeding", "Low (feat. T-…
## $ track_popularity <int> 0, 0, 79, 53, 52, 0, 62, 45, 24, 74, 0, 67, 32, 61, 4…
## $ track_album_name <chr> "Moments", "Hollywood's Bleeding", "Mail on Sunday", …
## $ release_date     <chr> "2018-04-02", "2019-09-06", "2008-03-17", "2013-08-13…
## $ artist_name      <chr> "Eugen Menjaev", "Post Malone", "Flo Rida, T-Pain", "…
## $ danceability     <dbl> 0.507, 0.404, 0.918, 0.643, 0.798, 0.767, 0.961, 0.30…
## $ energy           <dbl> 0.9300, 0.6450, 0.6090, 0.6470, 0.6970, 0.8910, 0.461…
## $ key_name         <chr> "C#", "E", "A#", "C", "A#", "C", "A", "E", "A", "A#",…
## $ loudness         <dbl> -4.532, -3.221, -5.640, -9.061, -4.803, -3.010, -8.68…
## $ mode_name        <chr> "minor", "minor", "minor", "major", "minor", "major",…
## $ speechiness      <dbl> 0.3270, 0.0479, 0.0791, 0.0795, 0.2990, 0.0901, 0.270…
## $ acousticness     <dbl> 0.0004, 0.3310, 0.0928, 0.3690, 0.0823, 0.0062, 0.202…
## $ instrumentalness <dbl> 8.16e-01, 0.00e+00, 0.00e+00, 0.00e+00, 0.00e+00, 0.0…
## $ liveness         <dbl> 0.1110, 0.1040, 0.1390, 0.3340, 0.2730, 0.4060, 0.162…
## $ valence          <dbl> 0.1560, 0.1580, 0.3040, 0.6320, 0.5580, 0.8240, 0.324…
## $ tempo            <dbl> 160.009, 130.215, 128.008, 76.948, 164.910, 119.910, …
## $ time_signature   <int> 4, 4, 4, 4, 4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ duration_ms      <int> 411011, 156253, 231400, 185480, 189501, 195333, 20764…
## $ explicit         <lgl> FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, …
## $ artist_genre     <chr> NA, "dfw rap, melodic rap, rap", "dance pop, miami hi…
head(spotify_data)
##              playlist_id    playlist_name               track_id
## 1 039eedcSC7R0SzmfHujnd8 My Shazam Tracks 00CGiXdQ9ObQow7jwbSfLv
## 2 039eedcSC7R0SzmfHujnd8 My Shazam Tracks 01iuzK2ziYAVE2n9aSXbWp
## 3 039eedcSC7R0SzmfHujnd8 My Shazam Tracks 0CAfXk7DXMnon4gLudAp7J
## 4 039eedcSC7R0SzmfHujnd8 My Shazam Tracks 0e3aRkhcCdkYN62p2PFfD3
## 5 039eedcSC7R0SzmfHujnd8 My Shazam Tracks 0ldWh5GwkxnhjoUdGKzXEq
## 6 039eedcSC7R0SzmfHujnd8 My Shazam Tracks 0onslUNmSLDkuYaideYWir
##                      track_name track_popularity     track_album_name
## 1                   Sad Moments                0              Moments
## 2          Hollywood's Bleeding                0 Hollywood's Bleeding
## 3            Low (feat. T-Pain)               79       Mail on Sunday
## 4                          24/7               53       Summer Anthems
## 5 Backin' It Up (feat. Cardi B)               52             UNDER8ED
## 6                         Juice                0                Juice
##   release_date                artist_name danceability energy key_name loudness
## 1   2018-04-02              Eugen Menjaev        0.507  0.930       C#   -4.532
## 2   2019-09-06                Post Malone        0.404  0.645        E   -3.221
## 3   2008-03-17           Flo Rida, T-Pain        0.918  0.609       A#   -5.640
## 4   2013-08-13               Common Kings        0.643  0.647        C   -9.061
## 5   2019-11-15 Pardison Fontaine, Cardi B        0.798  0.697       A#   -4.803
## 6   2019-01-04                      Lizzo        0.767  0.891        C   -3.010
##   mode_name speechiness acousticness instrumentalness liveness valence   tempo
## 1     minor      0.3270       0.0004            0.816    0.111   0.156 160.009
## 2     minor      0.0479       0.3310            0.000    0.104   0.158 130.215
## 3     minor      0.0791       0.0928            0.000    0.139   0.304 128.008
## 4     major      0.0795       0.3690            0.000    0.334   0.632  76.948
## 5     minor      0.2990       0.0823            0.000    0.273   0.558 164.910
## 6     major      0.0901       0.0062            0.000    0.406   0.824 119.910
##   time_signature duration_ms explicit
## 1              4      411011    FALSE
## 2              4      156253    FALSE
## 3              4      231400    FALSE
## 4              4      185480    FALSE
## 5              4      189501     TRUE
## 6              4      195333     TRUE
##                                                                                                               artist_genre
## 1                                                                                                                     <NA>
## 2                                                                                                dfw rap, melodic rap, rap
## 3 dance pop, miami hip hop, pop, pop rap, dance pop, hip hop, pop, pop rap, r&b, rap, southern hip hop, urban contemporary
## 4                                                                                            jawaiian, pacific islands pop
## 5                                                                        rap, upstate ny rap, dance pop, pop, pop rap, rap
## 6                                                               dance pop, escape room, minnesota hip hop, pop, trap queen

The data consists of 60 rows (one per song) and 23 columns.

Visualisation 1 - Favourite Artists

First, I’d like to look at the artists in the playlist. Are there any artists that I listen to more than others?

Some cells in the artist_name column hold several artists separated by commas (for example, "Flo Rida, T-Pain"). We’ll need to split them so that each row holds a single artist.
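To see what this splitting does, here is a small sketch using three of the tracks printed by head() above. The names toy_tracks, toy_long and toy_counts are only for illustration and are not part of the analysis itself.

```r
library(dplyr)
library(tidyr)

# Three tracks copied from the head() output above (illustration only)
toy_tracks <- tibble(
  track_name  = c("Low (feat. T-Pain)", "Backin' It Up (feat. Cardi B)", "Juice"),
  artist_name = c("Flo Rida, T-Pain", "Pardison Fontaine, Cardi B", "Lizzo")
)

# One row per (track, artist) pair: the 3 tracks become 5 rows
toy_long <- toy_tracks %>%
  separate_rows(artist_name, sep = ", ")

# Number of songs per artist, most frequent first
toy_counts <- toy_long %>%
  count(artist_name, name = "num_songs", sort = TRUE)

print(toy_long)
print(toy_counts)
```

Each artist appears only once in this tiny sample, so every count is 1. On the full playlist the same steps are what let artists with several songs rise to the top.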

# separating artists in a new data frame, counting songs by each artist and sorting
# from highest to lowest 
separated_spotify_data <- spotify_data %>% 
  separate_rows(artist_name, sep = ", ") %>%
  group_by(artist_name) %>%
  summarise(num_songs = n()) %>%
  arrange(desc(num_songs)) %>%
  filter(num_songs >= 2)

# plotting the result on a bar chart
separated_spotify_data %>% ggplot() +
  geom_col(aes(y=artist_name, x=num_songs), fill = "#6a0dad") +
  labs(x = "Number of songs", y = "Artist name",
       title ="Do I prefer any specific artist to others?",
       subtitle = "Let's find out if I tend to listen to more songs by a particular artist/artists",
       caption = "Source: Spotify data") +
  theme(axis.text.y = element_text(angle = 45))

I have filtered the data to artists who appear at least twice on the playlist. It turns out I do have favourite artists. Interesting!

Visualisation 2 - Track Popularity and Energy over the years

In the next analysis, I’d like to look at the relationship between the popularity and energy values of tracks, and at what the trend looks like over the years.

# creating a new date variable to look into years
# and sorting the years ascending
spotify_data <- spotify_data %>%
  mutate(year_published = str_sub(release_date, 1, 4) %>% as.numeric()) %>%
  arrange(year_published)

new_plot <- ggplot(spotify_data, 
  aes(x = track_popularity, y=energy, size = tempo, colour = track_id)) +
  geom_point(show.legend = FALSE, alpha = 0.7) +
  labs(x = "Track Popularity",
       y = "Energy",
       title ="Track popularity vs. Energy over the years",
       subtitle = "Let's find out the trend between track popularity and energy over the years",
       caption = "Source: Spotify data") +
  transition_reveal(year_published)
  
new_plot

The animation suggests a positive relationship between track popularity and energy, and newer songs appear to be both more popular and more energetic. Good time to be alive!
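A reading taken from a plot can be backed up with a number. The sketch below shows one way to do that with base R's cor(). The helper popularity_energy_cor() is a hypothetical name, not part of the analysis above, and the six rows are just the ones printed by head() earlier, used so that the snippet runs on its own. Six tracks are far too few to say anything about the whole playlist, so the figure that matters is the one from calling the helper on the full spotify_data.

```r
# First six values of track_popularity and energy, copied from the head() output
first_six <- data.frame(
  track_popularity = c(0, 0, 79, 53, 52, 0),
  energy           = c(0.930, 0.645, 0.609, 0.647, 0.697, 0.891)
)

# Hypothetical helper: correlation between popularity and energy,
# ignoring rows where either value is missing
popularity_energy_cor <- function(df, method = "pearson") {
  cor(df$track_popularity, df$energy, method = method, use = "complete.obs")
}

pearson_r  <- popularity_energy_cor(first_six)
spearman_r <- popularity_energy_cor(first_six, method = "spearman")

print(c(pearson = pearson_r, spearman = spearman_r))
```

Spearman's rank correlation is included alongside Pearson's because several tracks have a popularity of 0, which could pull a linear measure around.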

Reflection

Like everyone else, I love music. I love listening to it, I love dancing to it and I love making it as much as I can by playing my piano. So, for me, the hardest part of this assignment was picking a playlist to analyse.

While I was looking through my Spotify profile, I came across the playlist that Spotify itself created for me from the songs I discovered using my Shazam app. In other words, this playlist includes songs that I liked listening to but didn’t know the names of, and completely new songs that I hadn’t heard before and wanted to discover and listen to more later.

What this means is that these are really the cherry-on-top songs for me: I liked them so much that I took the time to open the app and get to know them. Therefore, I believe this is the best playlist to give me insight into my music taste.

With my first visualisation, I hoped to answer some very hard questions: is there a specific artist that I listen to the most, what genre do I listen to the most, do I enjoy listening to sad songs or happy songs more, and so on.

In my second visualisation, I was more curious to figure out how songs have evolved over the years, particularly with respect to their track popularity and energy scores.

Overall, it has been quite an interesting assignment for me. I loved playing around with gganimate. It wasn’t super easy, but it was a lot of fun. Of course, the grammar of graphics is an ocean, and I realise I am playing in the shallow waters at the moment. However, I think we’ve made an incredible start and I’m looking forward to the challenges ahead, especially scraping data from the web!
